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Abstract 

The truncated plurigaussian model is often used to simulate the spatial distribution 
of random categorical variables such as geological facies. The problems addressed in 
this paper are the estimation of parameters of the truncation map for the truncated 
plurigaussian model. Unlike standard truncation maps, in this paper a colored Voronoi 
tessellation with number of nodes, locations of nodes, and category associated with each 
node all treated as unknowns in the optimization. Parameters were adjusted to match 
categorical bivariate unit-lag probabilities, which were obtained from a larger pattern 
joint distribution estimates from the Bayesian maximum-entropy approach conditioned 
to the unit-lag probabilities. The distribution of categorical variables generated from the 
estimated truncation map was close to the target unit-lag bivariate probabilities. 

The predictive performance of the model is evaluated using scoring rules, and con¬ 
ditioning of the latent Gaussian fields to log-data is generalized for the case when the 
truncated bigaussian model is governed by a colored Voronoi tessellation of the trunca¬ 
tion map. 


1 Introduction 


Initially developed by Matheron et al. (1987), the categorical-valued truncated Gaussian model 
provides a method for simulating random categorical-valued fields with desired proportions and 
approximate transition probabilities from a latent Gaussian random field (GRF). The generality 
of the model was increased when it was extended to the truncated plurigaussian (TPG) model 
(Galli et al., 1994; Le Loc’h et al., 1994) in which simulation of a categorical variable is based on 
the values of an arbitrary number of GRFs. In fact, however, most plurigaussian applications 
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should more accurately be called bigaussian, as in most applications a categorical variable is 
assigned based on values of a random standard Gaussian pair. The function used to assign the 
categorical value is commonly rather simple so that it can be visualized by coloring the practical 
domain of M 2 with one color per category. The hyperparameters for the function that maps the 
GRFs to a categorical variable and the function itself are referred to as the truncation map. 
To simulate a categorical-valued random field in a bigaussian case, two GRFs are sampled. 
The spatial correlation within each of the GRF and possible correlation between those fields 
represent other model parameters. 

The categorical-valued TPG model has seen increasing application in a number of fields, 
mostly due to Gaussian distribution of the latent random fields which is useful in several data- 
assimilation applications. The attractiveness of the TPG model is also partly due to the ability 
to handle large model-size and partly due to the advances in modeling real geological fields and 
real data. One area with significant TPG application is in the characterization of heteroge¬ 
neous aquifers. Cherbunin et al. (2009) were able to reproduce complex categorical lithofacies 
geometry using the truncated bigaussian model for a contaminated aquifer in which category 
proportions varied with depth. A similar TPG aquifer characterization problem was described 
by Mariethoz et al. (2009) with porosity heterogeneity simulated as a GRF for each category. 


Perulero Serrano et al. (2012, 2014) assessed the suitability of the TPG model through im¬ 


proved aquifer flow simulation responses. Apart from aquifers, Emery (2010) provided a model 
for mineral proportion evaluation in an ore deposit. Another extensive area of TPG applica¬ 
tion is related to petroleum reservoir characterization. Fault facies modeling with TPG was 
provided by Fachri et al. (2013). Armstrong et al. (2011) include several complex reservoir ex¬ 


amples of primary diagenesis effects characteristic of carbonate sedimentary systems. Carrillat 


et al. (2010) compared the use of TPG on a giant carbonate oilfield with application of other 


methods. Albertao et al. (2005) and Al-Anezi et al. (2013) used TPG to describe the distribu¬ 
tion of facies for complex reservoir models while using varying category proportions computed 
from seismic data. 

The application of the TPG method to unconditional simulation (or conditional to a few 
observations) of categorical fields is often fairly straightforward once the model parameters have 
been estimated, but estimation of the model parameters can be difficult. Although it is clear 
that the TPG parameters should be estimated with respect to available static data (logs and 
cores), there is no general way to do this, and those data are typically not directly involved in 
the parameter optimization. If the observations are abundant, their derivatives, for example, 
experimental variograms, cross-variograms and proportions, can be computed. Otherwise, if 
the observations are sparse or are not representative, some categorical space relations might 
be estimated from other sources, often coming from geological studies of analogue fields. A 
simplified approach to estimation of the truncation map is to chose a truncation map from 
some benchmark set (Armstrong et al., 2011; Galli et al., 2006, among others). The choice of 


map from the truncation set is based mostly on allowing or not-allowing some category contacts, 
while classical thresholds of the truncation map have either vertical or horizontal orientation 
with a given mutual arrangement. The exact threshold values are then computed from category 
proportions. An advantage of this method of the truncation map parametrization is that it is 
trivial to adjust locally varying category proportions while the other parameters are fixed. To 
handle more complex contacts between categorical variables, Xu et al. (2006) increased the 
number of GRFs but retained the classical thresholding style. Allard et al. (2012) developed a 
more general approach to truncation map estimation using kernel regression methods to relate 
categorical variables to the auxiliary latent variables. The truncation map in their method 
is non-parametric, allowing complex contacts between categorical variables, but the method 
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requires joint observation of the categorical and the Gaussian variables at the well locations. In 
another approach which avoided the assumption of rectangular regions in the truncation map, 


Deutsch and Deutsch (2014) attempted optimization of the truncation map parametrized with 


Voronoi tessellation. One node was used for each category while optimization of node locations 
was split into a series of steps that operated on subsets of the data, for example, proportions 
and fixed-lag bivariate probabilities. 

In addition to estimating parameters of the truncation map, it may be necessary to esti¬ 
mate the parameters of the GRF such as the covariances of the latent fields and the possible 
correlation between them. These parameters must be estimated jointly with parameters of the 
truncation map, hence the problem of estimating the Gaussian covariance matrices is almost 
necessarily iterative. Covariance estimation usually assumes a fixed covariance model (known 
in advance), stationarity and geometric anisotropy. In the approach by Xu et al. (2006) it was 


possible to make an iterative adjustment for variogram ranges and angles. Kyriakidis et al. 


(19991) studied an accurate way to estimate Gaussian variograms in case of a univariate model 
with two categories separated by one threshold, based on experimental categorical variograms 
as input. 

A key requirement of the TPG method is to be able to condition the GRF to actual cate¬ 
gorical observations. This is necessary either because of the need to simulate categorical fields 
conditional to well observations, or to estimate the parameters of the truncation map condi¬ 
tional to the data. When the parameters of the TPG model are established, the conditioning of 
the Gaussian variables related to the categorical observations can be considered as a separate 
problem that is commonly solved with Gibbs sampler. For a large number of observations, 
however, the sampler encounters a problem related to moving search neighborhood application 
and covariance matrix inversion. This problem was addressed for the unconditional random 
fields by Lantuejoul and Desassis (2012) using a propagative version of the Gibbs sampler. The 


approach was adapted to the TPG fields with linear inequality constraints by Emery et al . 
(2014). 

The problem that is addressed in this paper is the need to reproduce geocellular models by 
multiple samples from the joint distribution of the categorical values (facies) to which petro¬ 
physical properties can be conditioned. The ensemble of sampled models can then be used 
to quantify the uncertainty in reservoir flow behavior. The method is based on the following 
assumptions. The prior model parameter distribution for the truncation map is based on a 
parametrization using colored Voronoi tessellations. This parametrization is more flexible than 
the classical truncation map parametrization, which is mostly based on proportions and on al¬ 
lowing (or not) facies contacts. The prior data parameter distribution ( following the notation 
by Tarantola (2005)) is based on the Bayesian maximum-entropy (BME) estimate of a small 
joint distribution of few observations (a pattern), used for regularization of the data, which 
itself consists of unit-lag bivariate probabilities. The goal is to estimate the parameters of a 
TPG model (or models) that provides categorical samples/patterns that are indistinguishable 
from the BME estimates. The comparison of the two models pattern distribution is made on 
the basis of expected frequencies from the BME estimate to observed frequencies in realizations 
from the truncated multivariate model. 

First, the truncation map parametrized as colored Voronoi tessellation is estimated while 
assuming that parameters of the GRF variograms are known. Second, to handle the problem 
of conditioning correlated Gaussian random variables to categorical observations when the 
relationship is governed by the Voronoi truncation map, we generalize the constrained version 
of the propagative Gibbs sampler by Emery et al. (2014). If a truncation map has complex 
form or isolated areas of the same category, the standard Gibbs sampler might not converge to 
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a likely state. The alternative sampler can more easily condition the correlated observation to 
the categories presented as unions of disconnected polygonal areas in the truncation map. The 
method proved to converge rapidly and allows faster sampling of multiple Gaussian random 
vectors that all give the correct conditioning. Third, the validation of the parameter estimates 
is based on the scoring rule performance as regards to a materialized event of joint categorical 
observations, mimicking possible geological data of consequent geological facies observations at 
well locations. 


2 Methods 


2.1 Methods: Truncation model estimation 

2.1.1 Prior information on model parameters 


The distribution of the categorical random vector is governed by the truncated plurigaussian 
model. For C being a finite set of facies categories, a truncation map with parameters 9 is a map 
Mg : M 2 H» C. The set of parameters 0 is given below for a particular type of parametrization. 
Mg maps each bivariate Gaussian realization (x, y ) E M 2 to the set C. When Gaussian vector 
realizations, 

x A = (x a , a E A), y A = (y a , a E A) (1) 


defined on a vector space dcM 2 (or E 3 ), are used instead, the map Mg(xA , ]Ja) = za represents 
a categorical vector realization z, obtained from component pair mapping, Mg(x a , y a ) = z a , a E 
A. Vector za = (z a , a E A) is a random realization of a random vector Za = (Z a , a E A), and 
xa, yA are realizations of the so-called latent variables Xa, Ya- 

Although the standard parametrization of a truncation map for a truncated bigaussian 
model is via threshold values of the latent Gaussian variables, the truncation map can alterna¬ 
tively be parametrized by a colored Voronoi tessellation (Du et ah, 1999). The prior information 
for the map Mg is conveniently specified in terms of a distribution for the number of nodes of 
the tessellation, T, node coordinates (y r , v T ) E E 2 and their categories ( T . We assume that 
the number of nodes T follows the Poisson law P /( (•), with specified mean of the Poisson dis¬ 
tribution, /i and that the location of each coordinate y T , v T , r = 1,..., T, has a prior standard 
normal distribution, while node categories ( T are independent and uniformly distributed over 
the set of categories C. It is assumed that C includes all possible categories of the field C. 
Thus, the set of parameters 6 includes all the above mentioned 


0 = (Xr, v T , Cr, t = 1,..., T). (2) 

Its prior distribution density take the following form. 

T i 

/© (#) = II 9{Xr)g{Vr)v ^, (3) 

where g(-) is the standard normal distribution density; and where \C\ denotes the cardinality of 
the set C. The mapping of each pair of latent Gaussian values (x a , y a ) to a categorical variable 
using the Voronoi tessellation of the truncation map is done as 


Mg(x a ,y a ) = Cr*, t* = argmin||(x a ,y a ) - (Xr,w)||- 


( 4 ) 





In other words, the mapping assigns the category at location a to be equal to the category of 
the node closest to the latent Gaussian pair related to this location. The covariance matrices of 
the latent GRFs Xa, Ya are assumed to be known. Different hypotheses about the covariance 
matrices structures could possibly be compared with the validation techniques discussed below. 
However, this work only shows the results for truncation map validation for specified covariance 
matrices. 

In this work Xa, Ya are assumed to be independent, but the estimation method equally 
could have been applied to the case where the two Gaussian random vectors are correlated. 
Without loss of generality, both Xa, Ya can have zero-mean distributions. The covariances of 
the Gaussian variables Cx(a, a') and Cy (a, a') are assumed to be functions only of a — a'. The 
choice of a truncation map parametrization is made as a trade-off between simplicity related 
to the computational cost and efficiency related to the model estimation given some data. The 



(a) Nodes with their categories 


(b) Coloring of E 2 with category (c) Truncation map representa- 
of the closest node tion in [—4,4] 2 


Figure 1: The representation and interpretation of a truncation map based on colored Voronoi 
tessellation 


representation and the interpretation of the truncation map is shown on the example Fig. [T] 
of a sample from a prior distribution with mean number of nodes equal to 8. Figure [IJa) 
gives the positions and the colors (categories) related to the nodes, together with the borders 
of the Voronoi cells. Then the categories are assigned to every value in R 2 , which results 
in coloring Fig. 0b). As far as every latent pair has standard Gaussian distribution, only 
the practical domain [—4,4] 2 is presented. The nodes are omitted in Fig. (3c), while the 
interpretation of the mapping of a latent pair (x, y) would be the color of the point with the 
coordinates (x,y). If greater simplicity of the partitioning is desired, other tessellations that 
reduce geometrical complexity might be considered. For instance, Voronoi tessellation with the 
Li metric limits the inclination of the tessellation borders to diagonal, vertical and horizontal. 


Another possibility is the T-tessellation (Kieu et al., 2013) which can be specified to have 


rectangular categorical areas, or even allow horizontal and vertical borders only. Even a simple 
coloring of cells with fixed thresholds can be considered. In terms of model estimation, all these 
types of parametrization could be used within the approach of this work, requiring only that 
the set of the prior distributions of the tessellation parameters are modified appropriately. 


2.1.2 Prior information on data parameters 

The model above is a probabilistic model. When the covariance functions Cx, Cy are fixed, the 
uncertainty in the model is due to the prior distribution on the parameters 6 and due to the 
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distribution of the latent random Gaussian vectors Xa, Ya- When, in addition, 6 are fixed, all 
categorical realizations Mq(Xa, Ya ) follow the same distribution, and the model randomness is a 
result only of different realizations xa,Va- When using stationary GRFs to generate categorical 
variables, the categorical field from the mapping function is also stationary. Because of the 
stationarity, the distribution of several zb,B C A might be experimentally available. Denote 
as Bh some bivariate vector ( 0 : 1 , 012 ) C A, such that ol\ — 02 = h. In this work, a particular 
case is considered. Distribution of zs h is assumed to be known for h = (1, 0) and h = (0,1) and 
some conventional unit length. It will be called bivariate unit-lag distributions. This is rather 
realistic case where several samples for the empirical distribution of zs h may be available along 
a vertical or a horizontal well, respectively. With no restriction, a three-dimensional case would 
have been treated similarly. The empirical probability of zs h is denoted 7 t(zb h )- 

The truncation map parameter estimation should be based on this data. In order for the 
bivariate distributions to be consistent with each others and to create further on a likelihood 
function to account for both of them, the distributions will be integrated in one joint distri¬ 
bution of a few observations (a pattern). The pattern distribution is then assumed correct in 
the truncation map estimation process. While the joint pattern distribution retains maximum 
uncertainty, the bivariate distributions represent marginal distributions for the pattern distri¬ 
bution. The pattern distribution is estimated even in case some of the bivariate probabilities 
are missing. The choice of pattern made in this work is a five-point neighborhood. It is de¬ 
noted B t e A, B* = (on,..., CK 5 ) is a sequence of length five, where the points form a five-point 
neighborhood, one point in the middle, and four surrounding points. The bivariate subsets of 
£>* with known distributions will be denoted B. The black frame in Fig. [ 2 ] shows the pattern 
and the frames of other colors show the known marginal bivariate distributions on B. The pairs 
of blue and light-blue elements, as well as the pairs of red and orange and elements have the 
same distribution, respectively due to stationarity. 


eB 


G B 



G B 


e B 

B* 


Figure 2 : A five-point patterns 5* with known marginal distribtions on elements of B 
Then the problem of finding a probability mass function (pmf) p(zb*) that maximizes the 
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entropy 


(5) 


H(p) = -J^pOb,) InpOsJ 


Z B* 


is addressed. This maximization problem is known as the Bayesian maximum-entropy (BME) 
principle. Here the distribution of the pattern B* is conditional to the marginal distributions 
on B. The entropy as given above is a measure of the amount of uncertainty represented 
by a probability distribution. Jaynes (11957) states that the BME estimate is the maximally 


noncommittal estimate with regard to missing information. 

The existence of an optimal distribution on B * is guaranteed as soon as the set of distri¬ 
butions that meet all constraints is nonempty. This is typically the case when all prespecified 
pmf derive from the same empirical distribution based on a training image on A or well data 
including categorical observations at a given lag. Even though, the optimal distribution is not 
necessarily unique, one optimal pmf p * on B* can be found in order to further proceed with 
maximum likelihood estimation of parameters 9 based on the data given by p*. The mathemati¬ 
cal manipulations below are simplified using the notation z = zb * • To compute the distribution 
that maximizes the entropy (Eq. ([5])) subject to the marginal distributions h(zb) for B E B, 
and the requirement that probabilities must sum to one, a Lagrangian is formed for p*, the 
Lagrangian is formed, 


L =-J2p(z) In p(z) + Y^ Y X ( z *) 

BeB ZBeC \B\ 


Y P(. Z Bi Z B *\b) - n(z B ) 

Z B*\B 


+ A 


_ 1 


( 6 ) 


where the X(z B )’s and A are Lagrange multipliers. Maximizing the Lagrangian leads to the 
linear system of equations 


dL 
dp(z ) 
dL 

d\(z B ) 

dL 


— — In p(z) — 1 + ^bgb M z b) + A — 0 
= £* b , xb p( z b, z b .\b) ~ *(zb) = 0; 


aX = £* p(2) - 1 = a 

The first set of equations gives that p* belongs to the exponential family 

p*(z) = e A_1 JJ e MzB l 

Moreover, using 1 = ^2p*(z), we obtain 


BgB 


P*(z ) = 


n p A ( z b) 
BgB C 


as well as a formula relating A to all other Lagrange multipliers 


A = 1 — In JJ e x(zB) \ . 

\ z BeB ) 


(7) 


( 8 ) 


(9) 


( 10 ) 
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Solving this nonlinear system of equations is not easy, but an alternative method is available. 
By analogy with an algorithm proposed by Deming and Stephan (1940) to estimate the entries 
of an array with prespecified marginal totals, the following iterative algorithm to estimate p* is 
proposed. 


Algorithm 1 Deming and Stephan algorithm outline 

(i) initialize p c ; 

(ii) generate 5 U(B)\ 

(iii) put p n (z ) = p c (z)tt(zb)/p c (zb) for each z E 

(iv) put p c = p n and goto (ii). 


Its principle is quite simple. At each iteration, a family of indices B E B is selected at 
random. Then the current distribution is updated to be assigned the prespecified 5-margin. 
Unfortunately this latter step affects all other marginal distributions. The algorithm must be 
made iterative so that all 5-marginals are respected in the long run. Eventually, this algorithm 
can be seen as a successive projections algorithm, one projection consisting of fixing a 5- 
marginal. Such a method has an interesting geometric interpretation. To fix ideas, let Mu be 
the set of all distributions on C^ B *\ with fixed 5-marginal 7 t(-#). Let also p be an arbitrary 
distribution on Then it can be shown that the distribution p* defined as 


satisfies 


p*(z) =p(z) 


7 r(z B ) 
p( z b) 


D(p*\\p) = min D(q\\p), 
q&M-s 

where D(q\\p) denotes the Kullback-Leibler divergence of q w.r.t. p: 


( 11 ) 

( 12 ) 


D(q\\p) = ^q(z) In 


Q( z ) 

p(z) 


(13) 


This optimal distribution p* is called the I-projection of p to M.b- Csiszar (1975) proposed a 
criterion for characterizing /-projections, p* is called the /-projection of p on M. b if and only 
if 

D(q\\p)> D(q\\p*) +D(p*\\p) q E M B - (14) 

One may check that p* as defined above by (11) satisfies 0- Indeed, the following equality 

D(q\\p) =^2q(z B )D(q(- Bt \B\-B)\\p(-B t \B\-B)) + D (q(- B )\p(- B )) (15) 

always holds. Moreover, Eq. ([14]) is satisfied by p* as an equality. One can readily see that 


E < l( z B)D(q(-B i ,\B\ 

7 ~D 

•b)||p(-b,\b|-b)) - D{q\\p*); 

(16) 

**t> 

D(q(- B )\\p(-B )) = D{p*\\p). 

(17) 


A proof of the equalities is given in Appendix A. The estimated distribution on the pattern 
5*, that integrates the known marginal distributions on 5 E B is further used ot estimate the 
truncation map parameters. The BME estimate on 5* is assumed to be the correct data in 
this following estimation. 














2.1.3 Truncation map goodness of fit 


The model parameters 6 should be estimated based on their prior distribution and the data. 
The prior distribution of the parameters of a truncated bigaussian model based on Voronoi 
tessellation were given as distributions on the number of nodes, the locations of nodes, and the 
colors of nodes (see discussion in Sect,2.1.1). The search for parameters of the truncation map 
should result in realizations of truncated bigaussians with approximately the same probability 
of occurrence of zb , as given by the BME approach. Notice, that the correlations of the latent 
random Gaussian vectors on B * are assumed to be known. Samples with zero-mean and known 
correlation matrices of the latent Gaussian realizations on B t are denoted xv /, ••• ,Xa and 


,(«■) 


B* ? 5 

y' B ', respectively for some n <S N. The categorical relative frequency from the mapped 
sample is denoted 


Vb] 


K z B,,0) = ^{Me(xQ,y%l) = z B ., r = l,...,n}\ (18) 

where | • | denotes the cardinality of the set. 

The mismatch between the pmf p* and the observed frequency can be quantified using a 
divergence function such as the Kullback-Leibler divergence 

= £/(**.,(>) ln§^y- (19) 


The Kullback-Leibler divergence is non-negative, taking the value zero only when the distri¬ 
bution of samples is equal to the expected distribution. The Kullback-Leibler divergence is 
seen above to have a close relation to the BME estimates Eq. (12), although other divergence 
functions (Bregman, 1967) with similar properties could be used instead. It is assumed that 
whenever a facies included in the pattern realizations zb * is not presented in the truncation 
map presented by the model parameters 6, the mismatch is infinitely large. This consequently 
gives the estimates with each facies assigned to at least one cell almost surely. 


2.1.4 Simulated annealing algorithm 


The problem is set as maximum likelihood estimation represented by the mismatch function 
F(0) in (19) above with negative sign. Parameters that minimize the mismatch function are es¬ 
timated by means of the simulated annealing algorithm. The target distribution at temperature 
T can be written as 

£(%*) = CCT)exp(-Th), 


where temperature parameter T changes with iterations, and C(T) is an unknown normalization 
constant, dependent on T. The temperature cooling schedule T (n) , where n is the iteration 
number is chosen to ensure that the algorithm does not get stuck in a local minimum at the 
early iterations. The temperature is typically allowed to cool slowly, for example, T (n> = T f0, o r ' 
for a < 1. The earlier period of the iterative process have the larger temperature values T to 
better explore the 6 space. The cooling period of the iterative process corresponding to the 
smaller values of T would reach a local minimum of the mismatch. This corresponds to a local 
maximum of the goodness of fit of the truncation map Mg to the the data expressed as the 
BME estimate. The candidate-generating distribution in the iterative process is based on the 
prior model in order to get an estimate close to the prior distribution. However, because the 
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problem is stated as maximum likelihood estimation, the estimate does not necessary respect the 
distribution of the number of nodes P /tt (-), the node colors or the node coordinates. A standard 
implementation of the simulated annealing for maximization of the function F(8), with the the 
associated temperature cooling schedule T(-), and the candidate-generating distribution q(9,9') 
is provided in Algorithm [2| 

Algorithm 2 Simulated annealing algorithm, given 9^°\ T^°\ q(-, •), = T^a n , F(-) 

(i) initialize ~9 C = 0(°); T c = T (0) 

(ii) generate 9 n ~ q(9 c , 9 n ). 

(iii) assign 9 C = 9 n with probability 


p(9 c ,9 n -T c ) = min 


exp(—F(6 > n )/r c ) 1 

exp(-F(9 c )/T c ) ’ j ‘ 


( 20 ) 


(iv) put T c = aT c and goto (ii). 


by 


If F is the objective (target) density and the acceptance probability p{9\, 6b, T) is replaced 


1 } \e x p(-F(#,)) 9 (9 2 |#i)’ 1 /’ 


( 21 ) 


which comes from setting T = 1 and accounting for the probability of proposing a move, this 
gives Metropolis-Hastings algorithm for sampling the likelihood. 

The candidate-generating distribution q based on the prior information on the model pa¬ 
rameters is created as following. The truncation map is parametrized by a colored Voronoi 
tessellation. The prior distribution is conveniently specified in terms of a probability distribu¬ 
tion of the number of nodes, distribution of the coordinates and the colors of the nodes. The 
proposals for the Markov chain includes birth and death events. The proposed number of nodes 
u n , given the current number of nodes u c , is sampled from {y c — 1, u c , u c + 1}, with probabilities 
proportional to ( u c — 1), P /( (/y : ) and P /( ( u c +1) respectively, where P ;i (•) is the Poisson density 
with parameter p. Depending on the outcome of the previous step, the proposal truncation 
map is sampled as following. In a birth event {y n = u c + 1) the location of a new node is sam¬ 
pled at random according to the prior distribution and added to the current nodes. When the 
number of nodes stays the same (y c = u c ), an existing node is chosen uniformly and resampled 
according to the prior distribution. In a death event (y n = v c — 1), one of the existing nodes 
is selected uniformly for removal. Any of the events above produces a new tessellation, which 
might significantly change the map if the total number of nodes is small. 


2.2 Methods: Validation 

Any estimate of the truncation map together with the probability distribution of latent Gaussian 
random vectors provide a predictive distribution of categorical realizations at every location. 
Whenever observations at locations are available, this will be called an event. A good proba¬ 
bilistic model should provide adequate probability of generating the correct categories at those 
locations. The probabilities are only indirectly assimilated as marginals of the the pattern pmf 
and the bivariate pmf. One one hand, the observations can be poor and spatially scattered to 
give the correct empirical estimates of the transition probabilities. One the other hand, the 
categorical observations may also include joint observations with longer correlations, which are 
not reflected in the data. Moreover, the categorical observations can become available at a 
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later time. Altogether, the validation method, described below can be applied due to any of 
the mentioned reasons as well as to evaluate any assumption of the estimation method. They 
include the following: the prior distribution of the truncation map parameters related to the 
Voronoi tessellation and comparison to different types of parametrization; hypotheses on the 
latent variable probability distribution, including covariance matrices; choice of the cooling 
schedule or the form of the goodness-of-fit function. In this work, the quality of a probabilistic 
prediction will be measured using scoring rules. A scoring rule is a measure of the quality 
of a probabilistic prediction. It is said to be proper if the expectation of the scoring rule is 
maximized when the observations are drawn from the prediction distribution. The scoring rule 
of the TPG model parameters will be evaluated based on an ensemble of category predictions 
that materialize from a probabilistic model. In general, the comparison can only be made with 
observations or samples from the true distribution. However, two predictive distributions can 
be compared given the materialized event. 

For an event that constitutes of a unique observation at one location, a variety of scoring 
rules are available (Gneiting and Raftery, 2007). The sample space of this event is equal to 


the set of categories C for the model introduced previously. Let the probabilistic forecast be 
given as a pmf p(-) on C. Then, one example of a scoring rule of p given an event c G (7 is a 
logarithmic score, 

S(p, c) = logp(c), (22) 

Conversely, for a vector of observations one might use so-called scoring rules of the model 
parameters given the unordered data. For the sake of simple notation the model parameters, 
denoted 9, includes both truncation map parameter values, Eq.([2]) and the statistics of the latent 
Gaussian random vectors distributions, which govern together the predictive distribution. A 
materialized event is denoted as Zo = ( z^d G D ) and is an observation vector at the locations 
D C A. An example of a related scoring rule based on the logarithmic score (Gneiting and 


Raftery, 2007, Eq. (55)) for the model parameters 9 , given this event, is 


Se — E(p\d )[log P{Z,i — Zd\Z^£)\d) — Z(£>\d),9)]. 


(23) 


deD 


Here at every location d a logarithmic score is taken for the observation Zd conditioned to the 
observations V(o\d) at locations ( D\d ). The set of locations ( D\d ) is a random subset of all 
observations D excluding d, ( D\d ) ~ U(V(D\d)). The expectation is taken over ( D\d ), and 
those values are summed up for all locations d G D. In practice, one should approximate 
the probabilistic forecast for each event Zd = Zd\Z(o\d) = z (D\d) , 9 of a categorical random 
variable Zd at location d given the observations z^D\d) at ( D\d ), and the model parameters 
9. The general steps of the simulation of a categorical observation Zd conditional to the set 
of categorical observations Z(o\d) are included in Algorithm |3j The problem of conditioning 
random categorical fields produced from truncated Gaussian or plurigaussian random vector to 
several categorical observations is often necessary beyond the application to the validation with 
the scoring rules. While the second and the third steps are straightforward, the next section is 
dedicated to the first step, and the sampling method used particularly for the truncation map 
in form of a colored Voronoi tessellation. 


2.3 Methods: Conditional simulation 

A common practical tool for unconditional Gaussian simulation is the Gibbs sampler. The Gibbs 
sampler has also been used to simulate latent variables conditioned to categorical observations 
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Algorithm 3 Simulation of z c i conditional to ^(D\d) = z {D\d) 

(i) Generate 


(X(D\d),y(D\d)) ~ -A/"(o, 


c 


X (D\d) 

0 


c 


Y, 


( D\d) J 


1 0, Mg(x( D \ d ),y( D \d)) — z (D\d)), (24) 


where 0 is a zero-vector of proper length, C 


X (D\d) 


and C 


Y(D\d) 


are the covariance matrices of 


X(D\d) and Y(D\d), respectively. Simulation of (x(D\d) and y(D\d) should be done simultane¬ 
ously due to the conditioning. This step represents itself an iterative procedure, such as the 


constrained Gibbs sampler or its alternative form described in Sect. |2.3| below. 

(ii) Generate Gaussian observation (, y r [) at the location d conditional to the vector of Gaus¬ 
sian observations at the locations ( D\d ), that is, simulate 


x d ~ -A/"(0, l|X(£>\d) — £(.D\d)), yd ~ N (0, l\Y {D \ d) — y(D\d)) , 


(25) 


where simulation of x^, and y ( j can be done separately due to the independence assumption. A 
standard method for conditional Gaussian simulation is to use kriging estimates for the mean 
and variance. 

(iii) Produce a categorical observation at d using the mapping M^x^yd) = z d- 


and governed by a bigaussian truncated model, despite the difficulty to compute or find the 


rate of convergence experimentally (Armstrong et al., 2011). Lantuejoul and Desassis (2012) 


proposed another iterative version of the unconditional simulation, known as the propagative 
version of the Gibbs sampler. This sampler avoids problems with the use of the moving search 
method and reduces the need for covariance matrix inversion. The main idea is to operate 
the Gibbs sampler with x~a = C x \ xa instead of the direct updating of the realization xa of a 

random vector with zero-mean and the covariance matrix Cx A ■ The initial vector x^ in the 
approach has all the same entries (e.g., zero-vector). Then at every update one index a G A is 
chosen at random in such a way that all the indices are selected infinitely often as the number 
of iterations tends to infinity. For a given index (3 E A (pivot), an update of x ^~ 1 > at state 
k — 1 to state x A J for every component takes the form 


x 


(k) 


— x cv ^ A ( — x^o + u^)Cx A (<T /3), Vcr G A, 


(26) 


where Cx a (<y,/3) is an element (a, f3) of the covariance matrix C\ A of Xa] 'Y k) is a standard 
Gaussian random realization independent of Xa- An adaptation to the case of the conditional 
TPG simulation based on the propagative version of the Gibbs sampler was given by [Emery 


et al. (2014). The author used the classical truncation map parametrization (Armstrong et al. 


2011), where the problem of conditioning to categorical observations was equivalent to the 
problem of simulation of a GRF realization xa with correlation matrix Cx A , subject to linear 
inequality constraints 


T 


< r < T 

— ^OL _ cn 


Xa € [—oo,oo), I a G (—co,co], a € A. 


(27) 


The relation of the linear inequality conditioning on x was translated to conditioning on u, and 
the initial state :r (0) was chosen to be conditional to the categorical data. This work extends 
the idea for the case of conditional Gaussian simulation based on truncation map with the new 
parametrization in form of colored Voronoi tessellation representing the truncation map. The 
main usefulness of the method consists in the improved mixing quality. 
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Conditioning to a categorical observation z a within the model can be seen as conditioning 
of a pair of Gaussian realizations (x a ,y a ), where the two-dimensional point (x a ,y a ) belongs to 
the intersection of a union of polygons. Due to possible triangulation of each cell of the Voronoi 
tessellation, one can write this subset as a union of triangles A ( cl 2 ,lG T(z a ) with pairwise 
disjoint interiors. An example of a triangulation of the truncation map used in Fig. [I] is shown 
in Fig. [3| In order to triangulate only bounded Voronoi cells, a few additional nodes with large 
absolute coordinate values are added. Thus, some triangle edges are cut by the domain bounded 
by the practical range of the bivariate Gaussian distribution. Adding external nodes does not 
change the tessellation inside this domain. The truncation map is reproduced in Fig. [3^a) for 
visualization purpose. 




Figure 3: Example of triangulation of a truncation map based on colored Voronoi tessellation 


The equations for GRFs (xa, !Ja ) of the propagative version of the Gibbs sampler for a given 
pivot 0 take form similar to Eq. ( p6| ). Written explicitly for every element a E A, this gives 
the following. 


4‘> =4‘-” + (-a# -1 * +uM)C XA (a,P) 

£t) = y it-D + + v W)Cy A (a, P), 


(28) 


where Cx A (a, 0) and Cy A (a, 0) are the elements ( a , /3) of the covariance matrices C\ A and Cy A 
respectively. u (k> and v (k ' ) are standard Gaussian random variables, independent of Xa and Ya- 
Given the current states x^ k ~ x \ y^ k ~ x \ the updates in Eq. (28) are functions of and v^ k \ 




(29) 


is an affine transformation. Thus to update all (x a , y Q ) E (J A t , a E A at a given state 

tGT(z<x) 

k given the pivot 0 one should simulate 


(«<*>,!><*>)£ PI U 4;i(A t ), 

a^A t€T(z a ) 


(30) 


in order to satisfy the constraints. The sampling domain for can also be represented 

as a union of triangles with zero-probability of mutual intersections. Every iteration of the 
conditional simulation is straightforward from bivariate Gaussian simulation in a triangle and 
the cumulative distribution function over a triangle. The later is computed based on Zelen 


and Severo (2012, Eq. 26.3.23 and Fig. 26.7-26.10). Then a sampling of the two variables 
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within a triangle is performed using a few iterations of the Gibbs sampler. The alternative 
sampler performs best when starting with several iterations of standard Gibbs to get consid¬ 
erable improvement from the initial state sampled as uncorrelated observations conditional to 
categories, then alternating standard Gibbs sampler and the alternative Gibbs sampler every 
iteration. The first one is still used for fast improvement of correlations while the second one 
allows to better explore the probable states through simultaneous updates of the entire latent 
vectors. It has been found that a good initialization similar to the zero-state of the uncon¬ 
ditional simulation, is to assign the same pairs of values to the pair of elements of the state 
Va G A corresponding to the same categorical observation z a = z G C, that is, 

(^ 0) »2/i 0) ) = &y)(z) V« E A I z a = z = Mg((x,y)(z)), (31) 

where the pairs of values (x,y)(z) for z G C are chosen arbitrarily. 

One scan of the sampler includes the updates of (xa, Va ), given every index /? G A as pivot, 
where the order of choosing (3 G A is random. One iteration of the sampler in the application 
to this work included one scan of a standard Gibbs sampler, and one scan of the conditional 
propagative version, which together provided fast convergence. Algorithm [4] shows the steps of 
the iterative procedure. 


Algorithm 4 Conditional simulation of the truncated bivariate model 

(i) Set k = 0; 

(ii) initialize (xaiVa)^ according to Eq. (31); 

Comment: propagative version scan 

(iii) select a random path of indices (3 G A; 

(iii.a) for a selected (3 E A simulate {u^ k \v^) according to Eq. (30); 

(iii.b) update (xa\ya^) Va G A according to Eq. (28); 

Comment: standard Gibbs sampler scan 

(iv) select a random path of indices a G A; 

(iv.a) for a selected a G A update 


1 0 
0 1 


(4Gj/<' ,| )~a/'((o,o), 

(v) set k = k + 1 and goto (iii) 


x. ,4\a = 4C = Ab 0 . M e (xW,yl ,*>) = *„), (32) 


.(*) 


3 Application 

The methodology for estimating a truncation map from a realization of the categorical variables 
is illustrated with the following example. A synthetic categorical field Fig. [4|a) is generated 
from the truncation map in Fig. id) and from realizations of two latent Gaussian vectors 
(Fig-gb, c)), each with a Gaussian covariance matrix with scale parameter equal to 10. 

3.1 Synthetic transition data 

The transition probabilities that are used as data are derived from the synthetic categorical 
realization. For visualization, the transition probability for categorical realizations (z ai , z a2 ), G 
C (i) 2 , C = (1,2, 3,4}, (oq, Of 2 ) = B G Bh,h G H = {(0,1), (1,0)} are presented by areas of the 
circles with centers at the coordinates (z ai , z a2 ) in Fig. [5j 
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(a) Synthetic field 


(b) Gaussian realization xa (c) Gaussian realization y ,\ 


(d) Truncation map 


Figure 4: The observations come from (a) which is created with the GRFs (b,c) and the 
truncation map (d) 
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(a) (z ai , z a2 ) £ C 2 , (ai,at 2 ) — B £ (b) (z ai , z a2 ) £ C 2 , (ai, 012 ) — B £ fi(o,i) 

Figure 5: The area of circles at (z ai , z a2 ) e C 2 , C = {1, 2, 3,4} is proportional to 7 t(zb) 


3.2 Estimation results 



Iteration 0 3000 6000 9000 


Figure 6: States of the simulated annealing 


The same covariance matrices are used as in the synthetic realization Fig. |4](b,c) for the 
empirical transition probabilities. The states of the simulated annealing are shown at iteration 
0, 3000, 6000, 9000, in Fig. [6] for two Markov chains with different initial states. Despite the 
large difference in the initial states, the states of the truncation maps are similar after 9000 
iterations. 

Other parameters behavior related to the estimation process are shown in Fig. HI 
ures [7](a,b) show the likelihood function values and the number of nodes, respectively, plotted 
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(a) Likelihood function F(9^ t ' ) ) (b) Number of nodes for 9 {t) (c) Temperature T [t) 


Figure 7: Simulated annealing behavior at iteration t = 1,, 9000. 

for iteration 1 to 9000. The temperature cooling schedule T^> = Tocd,o: = 0.9995, T 0 = 500 is 
visualized for the same range of iterations in Fig. [7|(c). The likelihood functions decline with 
the iterations to a cooled state with close likelihood values. The mismatch of the initial states 
and some first few iterations for both chains is computationally to large to be tracked. At 
early iterations all the proposals are accepted. The prior mean number of nodes equals 20 here, 
which is well reflected at the early iterations, Fig. J^b). Then the number of nodes declines, 
presumably due to the fact that the the generation of the number of nodes does not take into 
account the probabilities related to their colors. This behavior is similar for both chains. 

3.3 Validation of results 

Estimation results are validated based on a set of categorical observations at the cross-sections 
X = 10,15, 20 of the synthetic field, Fig. [8]( a), the same field that was used to generate the 
transition data. It is reproduced here for visualization. Fig. |8|b) shows the materialized event. 
The event includes three aligned columns of categorical observations resembling vertical well 
categorical data. This event is used for the validation of the predictive distributions based on 
the truncated bivariate model parameters estimated above. 

Because the same covariance matrices are used in the synthetic realization for the empirical 
transition probabilities and in the prior model, the validation does not evaluate the latent 
variables distribution assumptions, although it evaluates the results of the estimation of the 
truncation map based on likelihood to the transition probability data. The validation also 
answers the question of how appropriate was the estimation based on the bivariate probabilities 
in regard to the long correlated observation vector in this particular case. 



(a) Synthetics field (b) Event 


Figure 8: Well categorical observations (b) at the cross-sections X = 10,15, 20 of the synthetics 
field (a) 


The predictive distributions are approximated through samples conditioned to random sub- 
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sets of observations. The number of random subsets for empirical predictive distribution at each 
location equals 50. An example of the simulation conditioned on subsets of the data is given in 
Appendix B. The reference model truncation map is available in this example, Fig. [4](d) . The 
predictive distributions based on the refernce model is shown in Fig. [fijA). Empirical distribu¬ 
tions for the predictive distributions based on the estimated truncation map from each chain 
are given in Fig. [9](b,c). The visualizations of the categorical predictive distributions from the 
reference map and from the estimates should be interpreted as a portion of a color in a column 
for each observation aligned in horizontal-axis below. The columns 1 to 20 correspond to the 
well observations at the cross-section X = 10, the columns 21 to 40 correspond to the observa¬ 
tions at cross-section X = 15, and the last columns from 41 to 60 are related the observations 
at the cross-section X = 20. A relative height of the observation color (category) from the 
horizontal-axis occupies in the respective column above it represents the probability of a cor¬ 
rect prediction at the respective location. Other categories have the probability proportional 
to their heights in this column. 



0 20 40 60 0 20 40 60 0 20 40 60 

(a) Reference truncation map (b) First chain (b) Second chain 


Figure 9: Empirical predictive distributions based on random subsets of observations for the 
reference truncation map (a) and the estimated truncation maps (a,b) 

Validation of probabilistic predictions is accomplished using scoring rules. If the value of 
the scoring rule is relatively small, the simulated annealing parameters can be assumed to be 
set correctly, for example, the temperature cooling schedule was slow enough. In this synthetic 
example, for which the true model parameters are known, it is possible to compare the scoring 
rule for probabilistic forecasts obtained using parameters from the reference model with the 
scoring rule computed for probabilistic forecasts obtained using parameters 0 from the estimated 
maximum likelihood model. The respective scoring rules for the three predictive distributions, 
approximating Eq. ((23]) are S ief = -23.214; S lst chain = -22.299; S 2n d chain = -21.615. The 
predictive power of three distribution is similar. The predictive distribution from the reference 
truncation map in Fig. j9]a) overestimates the ‘orange’ category, the predictive distribution 
from the estimated truncation map (first chain), Fig. (9](b) overestimates the ‘blue’ category. 
The predictive distribution from the estimated truncation map (second chain), Fig. |9](c) is close 
to the distribution given the first chain estimate, both visually and based on the scoring rule 
values. Overall, the validation shows that the estimated truncation maps are comparable to 
the reference truncation map, despite the small number of samples. 

4 Conclusions 

One of the challenges of using the TPG is the problem of estimating parameters of the trun¬ 
cation map that will generate realizations of categorical vectors with the desired distribution. 
This paper proposed an approach for estimation of a truncation map using a Voronoi tessel¬ 
lation to define regions of the truncation map corresponding to different categorical variables. 
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The categorical bivariate unit-lag distribution was assumed to be known from geological data. 
Estimation of parameters of the truncation map was performed using simulated annealing to 
minimize an objective function measuring the difference between a joint distributions from the 
BME model used to integrate the unit-lag transition probabilities in different directions, and 
the respective empirical distribution obtained from sampling with the truncation map. The 
covariance matrices of the latent GRFs were assumed to be known during the optimization. 
Parameters that were estimated included the number of Voronoi nodes, their locations and the 
categories assigned to each node. The validation of the result was performed by the scoring 
rules computation based on the unordered data from synthetic categorical joint observation as 
an event. An efficient method of conditional simulation for the defined model, was based on 
the propagative version of the Gibbs sampler. In particular, the simulation method was used in 
the validation implementation for conditioning to a number of observations with strong correla¬ 
tions, and exhibited good mixing properties. The validation of the estimates of the truncation 
map based on the scoring rules has been successful with the predictive power of the estimated 
model similar to each other and to the truncation map used for creating the synthetic field. 
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Appendix A: Proof of Csiszar criterion 

A proof of the first equality: 

D (q\\P*) = = J] q(z B )q(z BAB \z B ) In 

q( z b*\b\ z b) 


Y q ^ Y 9( z bab\ z b)^ 


zb 


B*\B 


z *( z B*\ B \ z b) 


- Yj <l( Z B)D(q(-B t \B\-B)\\p(-B*\B\-B))- 

ZB 

A proof of the second equality: 

p*(z) 


D(p*\\p) = YP *( z ) ln 

Z 

= £,(*,) In 


p(z) 

7 t{z B ) 


-- Yj 7 T ( z b)p( z b,\b\ z b) In 

Z 

= D(p*(- B ) ||p(-b)). 
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p( z b) 
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(a) 


(b) 


(c) 



Figure B.l: (a) Conditioning to the random subset of observations, (b) mixing of the latent 
Gaussian vector realizations during 200 iterations, (c) conditional simulation on the grid 


Appendix B: Conditional simulation example 

This part provides an example of conditional simulation based on a random subset of the 
observations in the event, Fig. [8| and the estimated truncation map Fig. [6] (first chain). To 
approximate a member of the sum for the scoring rule in Eq. (23), multiple random subsets 


should be sampled, and the Gaussian vectors, Eq. (24) should be simulated as one of the steps. 


An example is related to the member of the sum for the first observation of the event with the 
coordinates (10,1). Figure B.l[ a) shows the unordered subset of the observations. The latent 
Gaussian vectors with the related coordinates and the covariance matrix, is conditioned to these 
categorical observations according to the method described. 200 iterations of both the standard 
and alternative Gibbs sampler were used to ensure that the Markov chain for the Gaussian 
realizations have mixed Fig. B.l| (b). The vertical axes represents the sum of the logarithms of 
the probability densities of the respective latent Gaussian random vector. After the conditional 
Gaussian realizations at the related coordinates are obtained, the Gaussian simulation allows 
to sample and map the categorical realizations at the coordinate (10,1) (crossed square). For 
the sake of the example, the sampling is done in the entire grid, Fig. |B.lf c). In this realization 
example the simulated categorical observation at (10,1) is predicted correctly. 
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